FF-STGCN: A usage pattern similarity based dual-network for bike-sharing demand prediction

Accurate bike-sharing demand prediction is crucial for bike rebalancing and station planning. In bike-sharing systems, bike borrowing and returning behavior exhibits strong spatio-temporal characteristics. Meanwhile, bike-sharing demand is affected by the arbitrariness of user behavior, which leaves the distribution of bikes unbalanced. These factors pose great challenges to bike-sharing demand prediction. In this study, we propose FF-STGCN, a usage pattern similarity-based dual network for bike-sharing demand prediction that fully considers both inter-station flow features and similar usage pattern features. The model includes three modules: a multi-scale spatio-temporal feature fusion module, a bike usage pattern similarity learning module, and a bike-sharing demand prediction module. In particular, we design the multi-scale spatio-temporal feature fusion module to address the limited accuracy of multi-scale spatio-temporal features. Then, the bike usage pattern similarity learning module is constructed to capture the underlying correlated features among stations. Finally, the bike-sharing demand prediction module employs a dual network structure that integrates inter-station flow features and similar usage pattern features to produce the final prediction. Experiments on the Citi Bike dataset demonstrate the effectiveness of the proposed model, and ablation experiments further confirm the indispensability of each module.


Introduction
The bike-sharing system is an environmentally sustainable mode of transportation for short-distance urban travel; it helps reduce carbon emissions and improves connectivity with public transit networks. Governments in cities such as New York, Washington, Beijing, and Shanghai actively promote bike-sharing programs to mitigate traffic congestion. It has been reported that during the first half of 2021, New York City saw a daily average bike-sharing usage of 5 million times. An effective allocation strategy enhances user experience and generates revenue. Conversely, poor or imbalanced allocation strategies markedly reduce operational efficiency, raise dispatch costs, and lower user satisfaction. Bike-sharing demand prediction is a prerequisite for effective allocation: it analyzes the patterns of bike borrowing and returning at stations to predict demand in the near future. By capturing the dynamics of bike-sharing demand, operators can optimize allocation strategies and build an efficient, cost-effective, and user-friendly bike-sharing system.
Bike-sharing demand prediction models face several primary challenges. First, complex spatio-temporal dependency is a major factor influencing the accuracy of bike-sharing demand prediction, and it also reflects user travel patterns [1]. Examining only the local spatio-temporal characteristics of a specific station fails to capture the overall travel patterns of the entire system, which degrades predictive performance. In the spatial dimension, demand at adjacent stations is mutually influential and displays similar user travel patterns. Stations within the same functional zone may also exhibit comparable travel patterns [2]. In the temporal dimension, bike-sharing demand exhibits continuity. Therefore, analyzing user travel patterns and learning spatio-temporal features pose significant challenges for bike-sharing demand prediction.
Second, bike usage trends are influenced by various factors [3]. These trends exhibit both randomness and dynamism, which makes prediction a highly complex task. The factors affecting bike-sharing demand can be categorized as internal or external. Internally, short-term (continuity) and long-term (daily and weekly periodicity) dependencies affect demand, and users' borrowing and returning behavior between stations introduces time delays in demand. Externally, points of interest (POIs) and morning and evening peak hours significantly affect variations in demand. For example, stations near residential areas experience increased demand during the morning peak, when travel activity rises. Therefore, analyzing the intrinsic spatio-temporal features of bike-sharing uncovers global spatio-temporal patterns in demand, and introducing external factors helps reveal hidden correlations between stations. Together, these enable accurate demand prediction grounded in an understanding of user travel patterns.
To address these challenges, this study proposes FF-STGCN, a spatio-temporal bike-sharing demand prediction model based on usage pattern similarity analysis. The model captures correlated features among stations and mitigates the limited accuracy of multi-scale spatio-temporal features. Following the idea of feature integration, it first constructs a multi-scale spatio-temporal feature fusion module based on a multi-scale feature attention (MS-FA) network and an attention-based feature fusion network, which minimizes the loss of multi-scale spatio-temporal features. Next, a bike usage pattern similarity learning module uses temporal and spatial similarity calculators to capture underlying correlated features among stations. Finally, the prediction module employs a dual network structure containing a flow-based feature learner (FFL) and a pattern-based feature learner (PFL), whose outputs are aggregated to improve prediction accuracy.

Literature review
Accurately predicting bike-sharing demand is a crucial foundation for the effective operation and management of bike-sharing systems, and it has attracted widespread attention in recent years. Depending on the prediction task, bike-sharing demand prediction methods can be categorized as cluster-based or station-based.
Cluster-based prediction methods use clustering algorithms such as hierarchical clustering [4,5], Gaussian mixture models, supervised clustering [6], and community detection algorithms [7][8][9][10]. By analyzing different indicators, these methods reveal correlations between stations and use them to predict bike-sharing demand. For example, to capture the connections between bike-sharing stations, Wang et al. designed a two-tier fuzzy C-means clustering algorithm that groups stations by combining their geographic locations with the migration trends of bikes between them; they then integrated a multi-similarity reference model to predict the demand within each group [11]. Gu et al. proposed an interpretable bike flow prediction (IBFP) method, which divides the city into regions based on flow density, groups these regions with subspace clustering to construct interpretable bike-sharing flow patterns, and then models spatio-temporal interactions with graph-regularized sparse representation to predict bike-sharing flow [12]. However, these methods do not address the complexity of the iterative clustering process, which makes trends unstable across clustering iterations. Some researchers have therefore proposed iterative optimization solutions. For instance, Zhao et al. introduced a hyper-clustering algorithm that captures mobility trends among individuals and clusters to enhance a spatio-temporal neural network for demand prediction in bike-sharing systems [13]. Existing research has demonstrated the effectiveness and accuracy of cluster-based prediction methods. However, these methods rely on random initialization or manual parameter settings, which can make the clustering outcomes uncertain. Furthermore, cluster-based approaches inadequately consider variations in demand among individual stations within a cluster, which may limit their ability to predict bike-sharing demand accurately.
Station-based prediction methods predict bike-sharing demand at each station by constructing a network model. Researchers apply machine learning to historical data to discern patterns and project future demand. For instance, Harikrishnakumar et al. introduced a Quantum Bayesian Network (QBN) framework for real-time analysis of bike-sharing demand, aiming to improve both computational efficiency and accuracy [14]. However, machine learning-based demand prediction typically requires a substantial amount of data, and incomplete or inaccurate data may reduce accuracy. To address this, time series analysis models have been applied to bike-sharing demand prediction. For instance, Leem et al. proposed a two-stage time series prediction model based on online learning to tackle low prediction accuracy in environments with limited data and computational resources [15]; the model attains higher accuracy with less computational infrastructure. Additionally, the ARIMA model and its variants use autoregressive or moving average components to capture the temporal autocorrelation of data [16][17][18]. Cortez-Ordoñez et al. evaluated the significant differences among bike-sharing systems with diverse scales, characteristics, and usage patterns, and analyzed in detail the performance of existing predictive algorithms, including ARIMA and linear models, in each scenario [19]. Advances in deep learning have led to the widespread use of deep models to extract spatio-temporal correlations for bike-sharing demand prediction, such as Convolutional Neural Networks (CNN) [20], Long Short-Term Memory (LSTM) [21], and Recurrent Neural Networks (RNN) and their variants [22,23]. For instance, Li et al. used feature engineering to enhance the data and then employed LSTM to capture the spatio-temporal dependence of the historical data and make predictions [24]. Chen et al. proposed a demand prediction model that integrates the Discrete Wavelet Transform (DWT), the Autoregressive Integrated Moving Average (ARIMA) model, and LSTM. They decomposed the demand sequence into three high-frequency components and one low-frequency component using DWT, predicted the components individually with ARIMA and LSTM, and reconstructed the predictions through DWT to obtain the final result [25]. Furthermore, many scholars believe that combining different models can improve the accuracy of bike-sharing demand prediction. Specifically, Bai et al. used a cascade graph convolutional recurrent neural network to extract spatio-temporal correlations in the data and two multilayer LSTM networks to represent external meteorological data and time metadata separately [26]. Chai et al. proposed a multi-view spatio-temporal framework that combines multiple characteristics into a single model for predicting bike-sharing demand [27]. Other scholars have combined GCN and attention mechanisms to avoid incorporating features of irrelevant stations into the prediction process because of inadequate or erroneous prior knowledge [28][29][30][31]. For example, Huang et al. developed the Temporal Multigraph Convolutional Network (TMGCN) to capture the spatial topologies contained in dynamic OD graphs over time, and used a GAN structure to overcome the high sparsity of OD demand [32]. Furthermore, considering both the data collected from the bikes themselves and extended analysis data provides valuable insights for constructing demand prediction networks [33,34]. Unfortunately, user travel behavior varies across time and space, resulting in cyclical and volatile changes in bike-sharing supply and demand [35,36]. Stochastic fluctuations at individual stations can interfere with feature extraction and the learning of overall demand patterns. These models do not explicitly account for such random fluctuations and struggle to handle complex bike-sharing datasets, which lowers prediction accuracy. Therefore, it is essential to mitigate the stochastic volatility of bike-sharing demand to improve the prediction accuracy and robustness of the models.
However, most studies use independent modules, such as convolutional networks, recurrent neural networks, and their variants, to capture temporal and spatial dependencies separately. These studies capture temporal and spatial dependencies sequentially and do not fully consider the dynamic spatio-temporal dependencies of the system. As a result, they cannot handle the delay in bike-sharing demand caused by users' dynamic borrowing and returning behavior. Moreover, in practical systems, infrequent travel between adjacent stations results in a low correlation between them, whereas distant stations may display analogous usage patterns, indicating an implicit correlation. Consequently, existing studies predominantly emphasize local effects and neglect the system-wide dependency and the stochastic nature of user borrowing and returning behavior, which reduces prediction accuracy.
To comprehensively consider the randomness and global dependency of the bike-sharing system as well as user behavior patterns, we propose a bike-sharing demand prediction model based on the similarity of user usage patterns.

Problem definition
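In this part, we define the mathematical symbols and describe the problem in detail. Definition 1 (Inflow matrices) At the t-th time slot, we define the bike-sharing inflow matrix as $I^t$, where $I^t_{i,j}$ is the number of bikes borrowed from station $s_j$ and returned to station $s_i$ during the t-th time slot:
$$I^t_{i,j} = \left|\{P \in \mathcal{P}_t \mid P(O) = s_j,\ P(D) = s_i\}\right|$$
where $\mathcal{P}_t$ denotes the set of trips in the t-th time slot, and $P(O)$ and $P(D)$ represent the borrowing and returning stations of a trip $P$.
Definition 2 (Outflow matrices) At the t-th time slot, we define the bike-sharing outflow matrix as $O^t$, where $O^t_{i,j}$ is the number of bikes borrowed from station $s_i$ and returned to another station $s_j$:
$$O^t_{i,j} = \left|\{P \in \mathcal{P}_t \mid P(O) = s_i,\ P(D) = s_j,\ s_j \neq s_i\}\right|$$
where each such trip is borrowed from station $s_i$ and returned to a station other than $s_i$, and $|\cdot|$ is the cardinality of a set.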
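To make the two flow definitions concrete, the following is a minimal Python sketch that counts the inflow and outflow matrices from the trips of a single time slot. The `(origin, destination)` tuple input and the function name `flow_matrices` are hypothetical, and skipping round trips only in the outflow matrix is our reading of the outflow definition, not a confirmed detail of the model.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def flow_matrices(trips: List[Tuple[int, int]], num_stations: int) -> Tuple[Matrix, Matrix]:
    """Build the inflow and outflow matrices for one time slot.

    trips: (origin, destination) station indices of every trip in slot t
    (a hypothetical input format).
    inflow[i][j]  counts trips borrowed at s_j and returned at s_i.
    outflow[i][j] counts trips borrowed at s_i and returned at another
    station s_j; round trips (j == i) are skipped, as assumed here.
    """
    inflow = [[0] * num_stations for _ in range(num_stations)]
    outflow = [[0] * num_stations for _ in range(num_stations)]
    for origin, dest in trips:
        inflow[dest][origin] += 1
        if dest != origin:
            outflow[origin][dest] += 1
    return inflow, outflow


# Three stations: two trips s0 -> s1, one trip s1 -> s2, one round trip at s2.
I_t, O_t = flow_matrices([(0, 1), (0, 1), (1, 2), (2, 2)], num_stations=3)
print(I_t)  # [[0, 0, 0], [2, 0, 0], [0, 1, 1]]
print(O_t)  # [[0, 2, 0], [0, 0, 1], [0, 0, 0]]
```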
Definition 3 (Station geographic characteristics) We construct the geographic features of station $s_i$ from the number of POIs of each type in the region to which the station belongs, denoted as $P_i = \{p_1, p_2, \ldots, p_M\}$, where $p_m$ is the value of the POI vector for interest class $m$.
Definition 4 (Station temporal sequences) Based on the historical order data of bike-sharing, we define the temporal sequence of station $s_i$ as $X_i^T = \{x_i^1, x_i^2, \ldots, x_i^t\}$.
Problem (Prediction problem) Using the historical inflow and outflow features $I^{T-1} = \{I^0, \ldots, I^{t-1}\}$ and $O^{T-1} = \{O^0, \ldots, O^{t-1}\}$ up to time slot $t-1$, together with the geographic characteristics $P_i$ of all stations $s_i \in S$, we predict bike-sharing supply and demand, i.e., $I^t_{i,j}$ and $O^t_{i,j}$, for every station during time slot $t$, as shown in Eq 5:
$$\left(I^t, O^t\right) = F\left(I^{T-1}, O^{T-1}, \{P_i\}_{s_i \in S}\right)$$
where $F(\cdot)$ denotes the prediction function of our model.

Methodology
Fig 1 illustrates the overall structure of the proposed model, which consists of three modules: the multi-scale spatio-temporal feature fusion module, the bike usage pattern similarity learning module, and the bike-sharing demand prediction module. Specifically, the multi-scale spatio-temporal feature fusion module follows the idea of feature integration and uses the MS-FA network and an attention mechanism to address the limited accuracy of multi-scale spatio-temporal features. Then, the concept of estimating similar demand is incorporated into the bike usage pattern similarity learning module, which obtains usage pattern information by capturing the underlying correlated features among stations. Finally, the bike-sharing demand prediction module uses a dual network structure containing FFL and PFL to learn high-dimensional spatio-temporal features and similar usage pattern features and produce the final prediction.

Multi-scale spatio-temporal feature fusion
The demand for bike-sharing exhibits strong spatio-temporal characteristics. Intuitively, demand depends on recent historical flow data (short-term dependencies) while also exhibiting daily periodicity (long-term dependencies). However, when existing methods model objects at different scales, a series of pooling layers or other cross-layer operations can cause features to be lost. To mitigate this loss of multi-scale spatio-temporal features, we design a multi-scale spatio-temporal feature fusion module, which helps identify and exploit the periodicity of bike-sharing demand and thereby improves prediction accuracy. The module consists of two parts: feature training and feature fusion. Feature training uses a dual MS-FA network to train short-term and long-term features separately. Then, an attention-based feature fusion step fuses these multi-scale features into high-dimensional spatio-temporal features.
First, to consider both short-term and long-term dependencies systematically, we feed short-term and long-term temporal features into the model. The inflow and outflow matrices ($I^t$ and $O^t$) are expanded into short-term inflow and outflow matrices ($I^S$ and $O^S$) and long-term inflow and outflow matrices ($I^L$ and $O^L$), as depicted in Fig 2. Here, $t$ is the predicted time point, $N$ is the number of predicted stations, $c$ is the number of consecutive time slots used for short-term dependencies, and $d$ is the number of consecutive days used for long-term dependencies. $l_c$ denotes the length of a continuous time interval, while $l_d$ denotes the daily interval, i.e., the number of time slots per day. $l_d$ is related to $l_c$ as follows:
$$l_d = \frac{T_{day}}{l_c}$$
where $T_{day}$ denotes the 24 hours of a whole day.
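The relation between the slot length and the daily interval can be checked with a small Python sketch. It assumes $l_d = T_{day} / l_c$ (which matches the 96 and 48 slots per day used in the experiment setup) and assumes that the long-term input takes the same time of day on each of the previous $d$ days; the helper names are hypothetical.

```python
from typing import List, Tuple


def slots_per_day(interval_minutes: int) -> int:
    """l_d = T_day / l_c, with T_day = 24 hours expressed in minutes."""
    return 24 * 60 // interval_minutes


def short_long_indices(t: int, c: int, d: int, interval_minutes: int) -> Tuple[List[int], List[int]]:
    """Slot indices feeding the short-term and long-term matrices for target slot t.

    Short-term: the c slots immediately before t.
    Long-term: the slot at the same time of day on each of the previous d days
    (an assumed layout; the paper gives the exact arrangement in Fig 2).
    """
    l_d = slots_per_day(interval_minutes)
    short_term = [t - k for k in range(c, 0, -1)]
    long_term = [t - k * l_d for k in range(d, 0, -1)]
    return short_term, long_term


print(slots_per_day(15), slots_per_day(30))  # 96 48
short_term, long_term = short_long_indices(t=1000, c=12, d=7, interval_minutes=15)
print(short_term[-1], long_term)  # 999 [328, 424, 520, 616, 712, 808, 904]
```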
Feature training. Capturing the characteristics of bike-sharing demand across different time slots is crucial for enriching the demand features. We therefore construct a dual MS-FA network for feature training, consisting of a periodic MS-FA and an interval MS-FA that train the long-term and short-term flow matrices, respectively. The main structure of the MS-FA network is shown in Fig 3.
The underlying principle of MS-FA is that feature attention can be achieved at different scales by varying the size of the spatial pooling operation. Specifically, global average pooling (GAP) and a local channel context aggregator capture the global and local contexts of the short-term inflow matrix, respectively. The local context is then added to the global context in the attention module to fuse features and enrich temporal information. Finally, 1 × 1 convolution kernels are applied to the flow matrices to integrate the short-term and long-term features. Point-wise convolution (PWConv), which exploits point-wise channel interactions at each spatial location, is chosen as the local channel context aggregator.
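Therefore, we apply MS-FA to the short-term inflow matrices to obtain the short-term inflow embedding:
$$\hat{I}^S = \sigma\left(w_i * I^{S\prime} + b_i\right)$$
where $w_i$ and $b_i$ are learnable parameters, $*$ denotes the convolution operator, and $\sigma(\cdot)$ is the ReLU activation function. $I^{S\prime}$ is computed by:
$$I^{S\prime} = I^S \otimes M(I^S)$$
where $M(I^S) \in \mathbb{R}^{h \times N \times N}$ is the attentional weight generated by MS-FA from the broadcast sum $L(I^S) \oplus g(I^S)$ of the local and global channel contexts, $\oplus$ denotes broadcasting addition, and $\otimes$ denotes element-wise multiplication. The local channel context $L(I^S) \in \mathbb{R}^{h \times N \times N}$ and the global channel context $g(I^S) \in \mathbb{R}^{h \times 1 \times 1}$ are computed by:
$$L(I^S) = B\left(\mathrm{PWConv}_2\left(\delta\left(B\left(\mathrm{PWConv}_1(I^S)\right)\right)\right)\right)$$
$$g(I^S) = B\left(\mathrm{PWConv}_2\left(\delta\left(B\left(\mathrm{PWConv}_1(\mathrm{GAP}(I^S))\right)\right)\right)\right)$$
where the kernel size of $\mathrm{PWConv}_1$ is $\frac{h}{r} \times h \times 1 \times 1$, which reduces the number of input channels by a factor of $r$; $B(\cdot)$ denotes batch normalization, used to accelerate convergence; $\delta(\cdot)$ is the ReLU activation function; and the kernel size of $\mathrm{PWConv}_2$ is $h \times \frac{h}{r} \times 1 \times 1$, which restores the original number of channels.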
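A dependency-free Python sketch of the MS-FA computation for one $h \times N \times N$ flow tensor follows. It is not the authors' PyTorch code: batch normalization is omitted, the local and global branches get separate hypothetical weights, and the attention weight $M$ is taken as a sigmoid of the broadcast sum of the local and global contexts, following the MS-CAM design from attentional feature fusion, since the gating function is not stated here.

```python
import math
from typing import List, Sequence, Tuple

Tensor = List[List[List[float]]]  # h channels, each a 2-D map
Weights = List[List[float]]       # out_channels x in_channels


def pwconv(x: Tensor, weight: Weights) -> Tensor:
    """Point-wise (1 x 1) convolution: out[o] = sum_c weight[o][c] * x[c]."""
    rows, cols = len(x[0]), len(x[0][0])
    return [[[sum(w * x[c][r][q] for c, w in enumerate(w_o)) for q in range(cols)]
             for r in range(rows)] for w_o in weight]


def relu(x: Tensor) -> Tensor:
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def gap(x: Tensor) -> Tensor:
    """Global average pooling: every channel collapses to a 1 x 1 map."""
    return [[[sum(map(sum, ch)) / (len(ch) * len(ch[0]))]] for ch in x]


def channel_context(x: Tensor, w1: Weights, w2: Weights) -> Tensor:
    # PWConv_1 (h -> h/r), ReLU, PWConv_2 (h/r -> h); batch norm B(.) omitted.
    return pwconv(relu(pwconv(x, w1)), w2)


def ms_fa(x: Tensor, local_w: Tuple[Weights, Weights], global_w: Tuple[Weights, Weights],
          w_out: Sequence[float], b_out: float) -> List[List[float]]:
    """Sketch of MS-FA on one flow tensor x of shape h x N x N; returns N x N."""
    local = channel_context(x, *local_w)        # L(x): h x N x N
    glob = channel_context(gap(x), *global_w)   # g(x): h x 1 x 1
    h, n = len(x), len(x[0])
    # x' = x * sigmoid(L(x) + g(x)), with g(x) broadcast over the N x N grid
    # (the sigmoid gate is an assumption borrowed from MS-CAM).
    refined = [[[x[c][r][q] / (1.0 + math.exp(-(local[c][r][q] + glob[c][0][0])))
                 for q in range(n)] for r in range(n)] for c in range(h)]
    # 1 x 1 convolution to a single channel, then ReLU: sigma(w * x' + b).
    fused = pwconv(refined, [list(w_out)])[0]
    return [[max(0.0, v + b_out) for v in row] for row in fused]


# h = 2 channels, N = 2 stations, reduction r = 2 (so h/r = 1), all-zero context weights.
x = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 0.0], [0.0, 1.0]]]
zero = ([[0.0, 0.0]], [[0.0], [0.0]])
print(ms_fa(x, zero, zero, w_out=[1.0, 1.0], b_out=0.0))  # [[1.0, 1.0], [1.5, 2.5]]
```

With all-zero context weights the gate is 0.5 everywhere, so the output is half the channel sum after the final 1 × 1 convolution, which makes a convenient sanity check.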
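Similarly, the long-term inflow embedding $\hat{I}^L \in \mathbb{R}^{N \times N}$, short-term outflow embedding $\hat{O}^S \in \mathbb{R}^{N \times N}$, and long-term outflow embedding $\hat{O}^L \in \mathbb{R}^{N \times N}$ are computed in the same way using Eqs 7 to 10.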
Feature fusion. Feature training helps the model learn the characteristics of bike-sharing demand across different time slots. However, the learned features still suffer from limited multi-scale spatio-temporal accuracy, because the series of convolution operations produces coarse-grained results. Hence, to overcome this constraint and improve prediction performance, we develop a feature fusion method that fuses short-term and long-term features.
In feature fusion, we propose a fusion network that uses an attention mechanism to combine short-term and long-term features, extracting the high-dimensional inflow feature:
$$\hat{I} = \beta_I^S \odot \hat{I}^S + \beta_I^L \odot \hat{I}^L$$
where the attention weights $\beta_I^S$ and $\beta_I^L$ are computed by Eqs 12 and 13 with the learnable parameter $\omega_i$. Similarly, the high-dimensional outflow feature is:
$$\hat{O} = \beta_O^S \odot \hat{O}^S + \beta_O^L \odot \hat{O}^L$$
where $\beta_O^S$ and $\beta_O^L$ are computed similarly to Eqs 12 and 13. Finally, to jointly consider demand and supply features, we concatenate the outputs above:
$$T = \hat{I} \parallel \hat{O}$$
where $\parallel$ denotes the concatenation operation. The resulting high-dimensional spatio-temporal feature is $T \in \mathbb{R}^{2 \times N \times N}$.
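The attention-based fusion can be sketched as follows. Because the exact form of the attention weights is not recoverable here, this Python sketch assumes an element-wise softmax over the scores $\omega \cdot \hat{I}^S$ and $\omega \cdot \hat{I}^L$, a weighted sum of the two branches, and stacking of the fused inflow and outflow maps as the two channels of $T$; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def attention_fuse(short: Matrix, long_term: Matrix, omega: float) -> Matrix:
    """Fuse short- and long-term embeddings with element-wise attention.

    The weights beta_S, beta_L are a softmax over the scores omega * value,
    an assumed stand-in for Eqs 12 and 13; omega mimics the learnable omega_i.
    """
    fused = []
    for row_s, row_l in zip(short, long_term):
        out_row = []
        for a, b in zip(row_s, row_l):
            ea, eb = math.exp(omega * a), math.exp(omega * b)
            beta_s, beta_l = ea / (ea + eb), eb / (ea + eb)
            out_row.append(beta_s * a + beta_l * b)
        fused.append(out_row)
    return fused


def spatio_temporal_feature(i_s: Matrix, i_l: Matrix, o_s: Matrix, o_l: Matrix,
                            omega_i: float, omega_o: float) -> List[Matrix]:
    """T = I_hat || O_hat: fused inflow and outflow stacked as 2 channels."""
    return [attention_fuse(i_s, i_l, omega_i), attention_fuse(o_s, o_l, omega_o)]


T = spatio_temporal_feature([[2.0]], [[4.0]], [[1.0]], [[3.0]], omega_i=0.0, omega_o=0.0)
print(T)  # [[[3.0]], [[2.0]]]
```

With $\omega = 0$ the fusion reduces to a plain average; a larger $\omega$ shifts weight toward whichever branch has the larger activation at each entry.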

Bike usage pattern similarity learning
In a bike-sharing system, demand is influenced by various dynamic factors, including the geographic environment and the unpredictable borrowing and returning behavior of users. These factors make bike-sharing characteristics randomly volatile. However, the records of stations with similar bike usage patterns can reflect the borrowing and returning behavior of other stations in the same category during the same period. Hence, identifying stations with similar bike usage patterns is crucial for improving predictive performance: it reveals potential connections in bike-sharing and mitigates the impact of stochastic volatility on feature learning.
The bike usage pattern similarity learning module, which produces similar usage pattern features, consists of three parts: a temporal similarity calculator, a spatial similarity calculator, and a spatio-temporal similarity calculator. The temporal similarity calculator uses Dynamic Time Warping (DTW) [37] to measure the similarity of bike usage patterns between stations in the temporal dimension. The spatial similarity calculator uses the Pearson Correlation Coefficient [38] to measure this similarity in the spatial dimension. Finally, the spatio-temporal similarity calculator fuses the spatial and temporal similarities to construct the similar usage pattern feature.
Temporal similarity calculator. The temporal features of bike-sharing demand are crucial dynamic factors in analyzing the bike usage patterns of stations. Over the same time period, the usage patterns of some stations may be similar. Identifying such stations and using them to reflect the borrowing and returning records of other stations in the same category during the same period benefits bike-sharing demand prediction. Because the DTW algorithm removes the requirement, imposed by Euclidean distance, that time series have the same length, it has been widely used to measure the similarity between time series. We therefore use DTW to compute the similarity between the time series of any two stations over the $k$ time steps preceding the prediction moment, measuring their similarity in temporal patterns.
For example, assume that the time series of two stations over the $k$ time steps before the predicted moment are:
$$X = \{x_1, x_2, \ldots, x_k\}, \qquad Y = \{y_1, y_2, \ldots, y_k\}$$
The first step of DTW computes the distance between each pair of elements of the two time series to build the cost matrix $D$:
$$D(i, j) = \mathrm{dist}(x_i, y_j), \qquad 1 \le i, j \le k$$
where $\mathrm{dist}(x_i, y_j)$ is the Euclidean distance between the nodes $x_i$ and $y_j$. Next, the cumulative cost matrix $\gamma$ is built by dynamic programming:
$$\gamma(i, j) = D(i, j) + \min\{\gamma(i-1, j-1),\ \gamma(i-1, j),\ \gamma(i, j-1)\}$$
Finally, the cumulative cost $\gamma(k, k)$ measures the similarity in usage patterns between the two stations in the temporal dimension, and these values form the temporal similarity matrix TSV with entries $tsv(s_i, s_j)$.
Spatial similarity calculator. The geographic environment is a crucial spatial factor that is strongly correlated with user travel behavior and significantly influences bike-sharing demand. At stations with similar geographic environments, people's travel times and destinations tend to be similar. Analyzing the similarity of the geographic environment between stations therefore helps identify stations with common characteristics in the spatial dimension and strengthens the model's ability to capture stochastic features. As the Pearson Correlation Coefficient approaches 1, the positive correlation between the two stations' feature vectors increases, indicating that their usage patterns are more similar in the spatial dimension. We therefore measure the spatial similarity of usage patterns by computing the Pearson coefficient of the station geographic characteristics:
$$r(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$$
where $X_i$ and $Y_i$ denote the $i$-th components of the geographic characteristics of the two stations, and $\bar{X}$ and $\bar{Y}$ denote their means.
Spatio-temporal similarity calculator. To account for the influence of both temporal and spatial factors on stations, we use a spatio-temporal composite metric to measure the similarity of bike usage patterns between stations. The similar usage pattern features are given in Eq 24:
$$v(s_i, s_j) = \omega_1 \, tsv(s_i, s_j) + \omega_2 \, r(s_i, s_j)$$
where $v$ denotes the matrix of spatio-temporal usage pattern similarity between stations, $\omega_1$ and $\omega_2$ are learnable parameters, $tsv(s_i, s_j)$ denotes the similarity of usage patterns between stations $s_i$ and $s_j$ in the temporal dimension, and $r(s_i, s_j)$ denotes their similarity in the spatial dimension.
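The three calculators can be sketched in plain Python as below. `dtw` implements the standard cumulative-cost recursion, `pearson` the correlation coefficient over POI count vectors, and `similarity_matrix` the weighted combination of the two. The mapping $tsv = 1/(1 + \mathrm{DTW})$, which converts a cost into a similarity, and the fixed weights standing in for the learnable $\omega_1$ and $\omega_2$ are assumptions for illustration.

```python
import math
from typing import List, Sequence


def dtw(x: Sequence[float], y: Sequence[float]) -> float:
    """Cumulative DTW cost; for scalar series the Euclidean distance is |x_i - y_j|."""
    n, m = len(x), len(y)
    inf = float("inf")
    gamma = [[inf] * (m + 1) for _ in range(n + 1)]
    gamma[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])  # D(i, j)
            gamma[i][j] = cost + min(gamma[i - 1][j - 1], gamma[i - 1][j], gamma[i][j - 1])
    return gamma[n][m]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two POI count vectors; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x)) * math.sqrt(sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


def similarity_matrix(series: List[Sequence[float]], poi: List[Sequence[float]],
                      w1: float, w2: float) -> List[List[float]]:
    """v(s_i, s_j) = w1 * tsv(s_i, s_j) + w2 * r(s_i, s_j).

    tsv = 1 / (1 + DTW) turns the DTW cost into a similarity; this mapping
    and the fixed w1, w2 are assumptions (the model learns omega_1, omega_2).
    """
    n = len(series)
    return [[w1 / (1.0 + dtw(series[i], series[j])) + w2 * pearson(poi[i], poi[j])
             for j in range(n)] for i in range(n)]


series = [[0, 1, 2], [0, 0, 1, 2], [5, 5, 5]]
poi = [[3, 1, 0], [6, 2, 0], [0, 1, 4]]
v = similarity_matrix(series, poi, w1=0.5, w2=0.5)
print(dtw(series[0], series[1]))  # 0.0
```

The second series is longer than the first, yet its DTW cost to the first is zero because DTW may repeat elements when aligning, a comparison that Euclidean distance cannot make between series of different lengths.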

Bike-sharing demand prediction
In complex urban public transportation systems, bike-sharing demand exhibits complex spatio-temporal correlations. Traditional bike-sharing demand prediction learns the connections between stations by aggregating information from neighboring stations. However, users' borrowing and returning behavior creates hidden correlations between stations that convey more information than adjacency alone, and aggregating information along these correlations to learn inter-station features can significantly improve prediction performance. Furthermore, bike-sharing usage at neighboring stations can vary considerably because of temporal fluctuations in travel behavior and differences in the built environment, whereas stations that are geographically distant may exhibit remarkably similar usage patterns. Accurately capturing the correlations between similar stations and incorporating them into the prediction model may therefore significantly improve the accuracy and reliability of demand prediction. We thus employ a dual network structure to learn the hidden correlated features among stations based on their flows and usage pattern similarity. Fig 8 presents the bike-sharing demand prediction module, which consists of two key components: the flow-based feature learner and the pattern-based feature learner. The extracted features are then used by the demand prediction layer to produce the final prediction.
Flow-based feature learner. Users' borrowing and returning behavior establishes flow connections between stations: the greater the flow between two stations, the stronger their interdependence. Aggregating the characteristics of strongly correlated stations through flow relationships avoids introducing weak correlations and thus improves the performance and efficiency of prediction. However, conventional aggregation functions may not be suitable for capturing the flow characteristics of bike-sharing data. We therefore propose the FFL, which extracts the spatio-temporal correlations between bike-sharing stations by analyzing the flow of bikes between them.
First, we use $\hat{I}$ and $\hat{O}$ to construct the flow graph. At time $t$, it is expressed as $G^t = (N^t, E^t)$. Each node is $N_i^t = (s_i, T_i^t)$, where $T_i^t$ is the feature of station $s_i$ at time $t$. The edge between $s_i$ and $s_j$ is $E_{i,j}^t = (e_{i,j}^t, w_{i,j}^t)$, where $e_{i,j} = 1$ if $|\hat{I}_{i,j} - \hat{O}_{i,j}| > 0$ and $e_{i,j} = 0$ otherwise. The weight $w_{i,j}^t$ between $s_i$ and $s_j$ at time $t$ is defined in Eq 25, in which $S$ is the number of bike stations. Then, we develop a flow aggregator to improve the GNN. Specifically, $F^0 = \{F_1^0, F_2^0, \ldots, F_i^0, \ldots, F_n^0\}$ denotes the initial high-dimensional spatio-temporal features, where $F_i^0 = T_i$ and $T_i$ is given by Eq 15. We update the high-dimensional spatio-temporal features $F_i^k$ of station $s_i$ by aggregating the features of the stations most strongly correlated with $s_i$ in terms of flow, i.e., the set $\{F_u^{k-1};\ \forall s_u \in \partial(s_i)\}$, where $\partial(s_i)$ denotes the stations connected to $s_i$ in the flow graph and each neighbor is weighted by $w_{i,u}$ as computed in Eq 25.
A GRU then extracts the temporal dependency in the graph, and $F_i^f$ denotes the final embedding of station $s_i$ in the flow-convolved graph.
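A minimal sketch of the flow graph and one aggregation step is given below. The edge rule is the one stated above; the edge weights (a stand-in for Eq 25 that uses total flow normalized per row), the weighted-sum aggregator, and all function names are hypothetical, and the GRU stage is left out.

```python
from typing import List

Matrix = List[List[float]]


def flow_edges(i_hat: Matrix, o_hat: Matrix) -> List[List[int]]:
    """e_ij = 1 when |I_hat_ij - O_hat_ij| > 0, else 0 (the rule stated in the text)."""
    n = len(i_hat)
    return [[1 if abs(i_hat[a][b] - o_hat[a][b]) > 0 else 0 for b in range(n)]
            for a in range(n)]


def flow_weights(i_hat: Matrix, o_hat: Matrix, edges: List[List[int]]) -> Matrix:
    """Assumed stand-in for Eq 25: total flow on each edge, normalized per row."""
    weights = []
    for a, row in enumerate(edges):
        raw = [(i_hat[a][b] + o_hat[a][b]) * e for b, e in enumerate(row)]
        total = sum(raw)
        weights.append([v / total if total else 0.0 for v in raw])
    return weights


def flow_aggregate(features: Matrix, weights: Matrix) -> Matrix:
    """One hypothetical aggregation layer: F_i^k = sum_u w_iu * F_u^(k-1).

    Stations with no flow neighbours keep their previous features. The GRU
    that follows in FFL is omitted from this sketch.
    """
    out = []
    for i, row in enumerate(weights):
        if not any(row):
            out.append(list(features[i]))
            continue
        dim = len(features[i])
        out.append([sum(w * features[u][d] for u, w in enumerate(row)) for d in range(dim)])
    return out


i_hat = [[0.0, 2.0], [1.0, 0.0]]
o_hat = [[0.0, 0.0], [1.0, 0.0]]
edges = flow_edges(i_hat, o_hat)           # [[0, 1], [0, 0]]
w = flow_weights(i_hat, o_hat, edges)      # [[0.0, 1.0], [0.0, 0.0]]
print(flow_aggregate([[1.0, 1.0], [3.0, 5.0]], w))  # [[3.0, 5.0], [3.0, 5.0]]
```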
Pattern-based feature learner. Traditional prediction models often assume that neighboring stations are highly correlated. However, because of temporal variations in user behavior and differences in the geographic characteristics of stations, demand at neighboring stations can vary significantly. On the other hand, non-neighboring stations may exhibit greater similarity in their temporal and spatial usage patterns than neighboring stations do. Aggregating information from stations with similar usage patterns is therefore more conducive to accurate demand prediction. Consequently, we develop a pattern-based feature learner to learn the dependency of bike usage among stations with similar usage patterns.
The pattern-based feature learner adopts a multi-layer irregular convolutional architecture to capture the characteristics of bike-sharing demand among stations based on the similar usage pattern features. The output of the irregular convolution is fed into a GRU to extract the temporal correlation of bike-sharing demand. The irregular convolutional network structure is shown in Fig 9.
For each central station in the network, we identify the top $k-1$ stations according to the similar usage pattern features, and these stations are convolved together with the central station. In this irregular convolution, $C_{in}$ denotes the number of channels in the input $T$, $S$ denotes the number of convolutional kernels, $N_c^s(i, u)$ denotes the $u$-th neighbor whose bike usage pattern is similar to that of the central station $s_i$ in channel $c$, $w_c^s$ denotes the weight in the convolutional kernel corresponding to the neighbor $N_c^s(i, u)$, and $b_{i,u}$ is a learnable parameter. $F_i^s$ denotes the final embedding of station $s_i$ in the network.
Demand prediction. This step considers the impact of both bike flow and bike usage pattern similarity on a station. We concatenate $F_i^f$ and $F_i^s$:
$$F_i = F_i^f \parallel F_i^s$$
where $\parallel$ is the concatenation operation and $F_i$ is the final embedding of station $s_i$. We then feed $F_i$ to a fully connected (FC) layer to predict the demand and supply of the individual station at time $t$:
$$\left(\hat{x}_i^t, \hat{y}_i^t\right) = \mathrm{FC}\left(F_i; o_i^t\right)$$
where $\hat{x}_i^t$ and $\hat{y}_i^t$ are the predicted demand and supply of station $s_i$ at time $t$, respectively, and $o_i^t$ is a learnable parameter.
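The neighbor selection and the irregular convolution can be sketched as follows. This hypothetical single-kernel version builds, for each station, a receptive field made of the station itself and its $k-1$ most similar stations under the similarity matrix $v$, and applies one weight per channel and field position; multiple kernels, the per-position bias $b_{i,u}$, and the GRU are simplified away.

```python
from typing import List


def top_similar(v: List[List[float]], i: int, k: int) -> List[int]:
    """Indices of the k - 1 stations most similar to s_i under v (s_i excluded)."""
    others = [j for j in range(len(v)) if j != i]
    others.sort(key=lambda j: v[i][j], reverse=True)
    return others[: k - 1]


def irregular_conv(features: List[List[float]], v: List[List[float]], k: int,
                   kernel: List[List[float]], bias: float = 0.0) -> List[float]:
    """Single-kernel irregular convolution (hypothetical simplification).

    features[s][c]: channel c of station s. kernel[c][u]: weight for the u-th
    member of the receptive field [s_i, its top k - 1 similar stations].
    """
    out = []
    for i in range(len(features)):
        field = [i] + top_similar(v, i, k)
        out.append(sum(kernel[c][u] * features[s][c]
                       for c in range(len(kernel))
                       for u, s in enumerate(field)) + bias)
    return out


v = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
print(top_similar(v, 2, k=2))  # [1]
print(irregular_conv([[1.0], [2.0], [4.0]], v, k=2, kernel=[[1.0, 1.0]]))  # [3.0, 3.0, 6.0]
```

Because each receptive field is chosen by similarity rather than by grid adjacency, the neighbor set differs from station to station, which is what makes the convolution irregular. A tensor implementation would typically gather these neighbor indices once per similarity matrix and batch the weighted sums.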

Dataset
In the subsequent experiments, we use two real-world datasets: BikeNY (bike order records collected in New York City) and POINY (points of interest in New York City). The details of these two datasets are as follows:
BikeNY: It contains daily bike-sharing trip records for 120 stations in New York City from July 1, 2013 to February 28, 2014. Each order record mainly consists of the trip start (pick-up) time, trip end (drop-off) time, station name, station longitude, station latitude, and other information. Data from July 1 to December 17, 2013 are used as the training set, data from December 18, 2013 to January 11, 2014 as the validation set, and data from January 12 to February 28, 2014 as the test set.
POINY: It contains information about points of interest in New York City, including their categories, longitudes and latitudes, and names, provided by the New York City government.
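The experiment setup builds each station's geographical features from counts of 8 POI categories within a 150-meter radius. A minimal plain-Python sketch of that counting step follows, assuming POI records are (category, latitude, longitude) tuples and measuring distance with the haversine great-circle formula; the distance measure is our choice, since the text does not say how it is computed. The category names and coordinates in the check are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def poi_counts(station: Tuple[float, float],
               pois: Sequence[Tuple[str, float, float]],
               categories: Sequence[str],
               radius_m: float = 150.0) -> List[int]:
    """Count POIs of each category within radius_m of a station at (lat, lon)."""
    index = {name: k for k, name in enumerate(categories)}
    counts = [0] * len(categories)
    for category, lat, lon in pois:
        # Ignore categories outside the chosen feature set.
        if category in index and haversine_m(station[0], station[1], lat, lon) <= radius_m:
            counts[index[category]] += 1
    return counts


station = (40.7128, -74.0060)
pois = [("food", 40.7128, -74.0060),    # same location: 0 m
        ("food", 40.7138, -74.0060),    # about 111 m north
        ("office", 40.7228, -74.0060)]  # about 1.1 km north
print(poi_counts(station, pois, ["food", "office"]))  # [2, 0]
```

With the 8 categories used in the paper, each station would get an 8-dimensional count vector, which is the kind of input a spatial similarity such as the Pearson coefficient can compare.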

Experiment setup
We implement the proposed model in PyTorch. To prevent the model from over-focusing on differences in feature magnitude during learning, which may lead to inaccurate feature representations and exploding gradients, we apply z-score normalization [39] to the data constructed for the experiments, as shown in Eq 31.
$$x' = \frac{x - \mu}{\sigma} \qquad (31)$$
where $x$ is an input value, $x'$ is its normalized value, μ denotes the mean, and σ denotes the standard deviation.
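A minimal sketch of z-score normalization in plain Python follows. Estimating μ and σ on the training split only, using the population standard deviation, and keeping both values to map predictions back to bike counts are our assumptions (common practice), not details stated in the text.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_zscore(train_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate mu and sigma (population standard deviation) from training data."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)
    if sigma == 0:
        raise ValueError("constant series: z-score is undefined")
    return mu, sigma


def zscore(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Apply x' = (x - mu) / sigma element-wise."""
    return [(v - mu) / sigma for v in values]


def inverse_zscore(values: Sequence[float], mu: float, sigma: float) -> List[float]:
    """Map normalized predictions back to the original scale."""
    return [v * sigma + mu for v in values]


mu, sigma = fit_zscore([1, 2, 3, 4, 5])
print(mu)                         # 3.0
print(zscore([3.0], mu, sigma))   # [0.0]
```

Reusing the training-set μ and σ on the validation and test splits avoids leaking test statistics into the model.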
For the other hyperparameters, we set the time interval to 15 or 30 minutes, so the length of the daily cycle is 96 or 48 time slots, respectively (24 × 60 / 15 = 96 and 24 × 60 / 30 = 48). For the time dimensions of the input short-term and long-term inflow/outflow matrices, we set c to 12 (12 consecutive time intervals) and d to 7 (7 consecutive days). Furthermore, because the usage patterns of a station are influenced by the built environment within a 150-meter radius, we use the counts of 8 POI categories within a 150-meter radius of each station to construct the station's geographical features.
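The slot arithmetic and input windows above can be sketched as index selection. We assume the short-term input is the c slots immediately preceding the target slot t, and the long-term input is the same time-of-day slot on each of the d previous days (stride of one daily cycle). The text does not spell out the long-term indexing, so treat that part as an assumption; the function names are hypothetical.

```python
from typing import List

MINUTES_PER_DAY = 24 * 60


def slots_per_day(interval_min: int) -> int:
    """Length of the daily cycle: 96 for 15-minute slots, 48 for 30-minute slots."""
    if MINUTES_PER_DAY % interval_min:
        raise ValueError("interval must divide a day evenly")
    return MINUTES_PER_DAY // interval_min


def short_term_slots(t: int, c: int = 12) -> List[int]:
    """The c consecutive slots immediately before target slot t."""
    if t - c < 0:
        raise ValueError("not enough history for the short-term window")
    return list(range(t - c, t))


def long_term_slots(t: int, period: int, d: int = 7) -> List[int]:
    """Assumed: the same time-of-day slot on each of the d previous days, oldest first."""
    if t - d * period < 0:
        raise ValueError("not enough history for the long-term window")
    return [t - k * period for k in range(d, 0, -1)]


p = slots_per_day(30)             # 48
print(short_term_slots(1000))     # slots 988 .. 999
print(long_term_slots(1000, p))   # [664, 712, 760, 808, 856, 904, 952]
```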

Evaluation measurement
For model comparison, we use two commonly used indicators to evaluate the prediction performance of bike-sharing demand: Mean Squared Error (MSE) and Mean Absolute Error (MAE). MSE measures the average squared difference between the predicted and actual values, while MAE measures the average absolute difference between them. Both are computed over the predicted demand and supply of all stations, where N denotes the number of stations, $\hat{y}^{in}_{i}$ and $\hat{y}^{out}_{i}$ denote the predicted demand and supply of station $s_i$, and $y^{in}_{i}$ and $y^{out}_{i}$ denote the actual demand and supply.
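A minimal sketch of the two metrics over demand and supply follows. Averaging over all 2N station-direction pairs is our assumption, since the exact normalization of the paper's equations is not shown in this text; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def mse_mae(pred_in: Sequence[float], pred_out: Sequence[float],
            true_in: Sequence[float], true_out: Sequence[float]) -> Tuple[float, float]:
    """MSE and MAE over the demand (in) and supply (out) of N stations.

    Assumption: errors are averaged over all 2N station-direction pairs.
    """
    n = len(true_in)
    if not (len(pred_in) == len(pred_out) == len(true_out) == n):
        raise ValueError("all inputs must have one value per station")
    errors = ([p - y for p, y in zip(pred_in, true_in)]
              + [p - y for p, y in zip(pred_out, true_out)])
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return mse, mae


# Two stations: errors are 0, 1 (demand) and 0, -2 (supply).
print(mse_mae([1, 3], [3, 2], [1, 2], [3, 4]))  # (1.25, 0.75)
```

MSE weights the single large error (−2) more heavily than MAE does, which is why the two metrics can improve by different percentages.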

Benchmark models
Five benchmark models are adopted for performance comparison with FF-STGCN, including one time series model (LSTM) and four graph convolutional models (ST-GCN, MSTGCN, STSGCN, and MC_STGCN):
LSTM: It captures both short-term and long-term temporal dependencies by introducing a gating mechanism [26].
MC_STGCN: It adopts a graph convolutional network based on the Louvain algorithm, which effectively captures regional spatio-temporal dependency [8].
STSGCN: It uses a synchronous graph convolutional network to capture spatio-temporal dependency in complex local environments [40].
ST-GCN: It combines graph convolutional networks and temporal convolutional networks to analyze spatio-temporal data [41].
MSTGCN: It represents the intricate spatio-temporal characteristics of the data by leveraging non-Euclidean spatial graphs, and it captures spatio-temporal dependency through multi-graph convolution and context-gated recurrent neural networks [42].

Performance comparison
We compare the overall accuracy of our proposed model with that of several baseline models. Table 1 compares the prediction errors of the benchmark models and our model for bike-sharing demand on the BikeNY dataset.
Overall, our proposed model consistently outperforms the benchmark models in prediction accuracy in both the 15-minute and 30-minute time slots on the BikeNY dataset. Notably, in the 30-minute time slot, our model reduces MSE and MAE by 31.29% and 9.53%, respectively, compared with the best-performing of the five benchmark models. In the 15-minute time slot, it reduces MSE and MAE by 9.72% and 2.58%, respectively, compared with the best-performing benchmark model.
The results indicate that the hybrid models, which couple graph convolution with temporal modeling to learn features of bike-sharing demand, outperform LSTM in all time intervals. This is because LSTM is a typical time series approach that cannot exploit the spatial dependency of bike-sharing demand among stations. It illustrates that capturing the spatial dependencies between stations improves prediction performance.
STSGCN improves further on ST-GCN, possibly because STSGCN captures heterogeneity in local spatio-temporal graphs. MSTGCN achieves a considerable improvement over STSGCN, which demonstrates the importance of considering usage pattern similarity among stations and the effectiveness of a multi-structural network for capturing spatial correlation. Nevertheless, MSTGCN tends to focus on dependencies from neighboring stations and cannot sufficiently consider correlations with distant stations that have similar usage patterns. Consequently, although MSTGCN achieves good prediction results, our proposed model still outperforms it on all indicators.
The prediction accuracy of our model is much higher than that of MC_STGCN in both time slots. This is because our proposed model encodes long-term correlation by fusing short-term and long-term flow features, whereas MC_STGCN combines them into a single feature vector. These results imply that fusing short-term and long-term flow features is more effective than simply combining them for predicting bike-sharing usage.
Our proposed model uses network structures specific to different characteristics, which allows more effective learning than a single network structure. Moreover, we replace spatial neighbors with semantic neighbors in the irregular convolutions. As a result, compared with MSTGCN and STSGCN in the 30-minute time slot, our approach improves MSE and MAE by 31.29% and 39.39%, respectively.

Performance of models at stations with different bike usage levels
Satisfying users' travel needs is one of the important tasks of a bike-sharing system. However, bike-sharing demand is not evenly distributed across urban areas, which leads to discrepancies in the number of bikes available at stations with different levels of demand. Therefore, precisely predicting the demand and supply at stations with different usage patterns helps address the bike-sharing rebalancing problem.
We categorize the stations into four levels based on hourly bike usage in the New York City bike-sharing system and evaluate the models' performance at each level in the 30-minute time slot [37]. Specifically, stations with hourly demand in the range (106, 141] are classified as high-demand (grade 1); those in (85, 106] as moderately high-demand (grade 2); those in (59, 85] as moderately low-demand (grade 3); and those in (0, 59] as low-demand (grade 4). Experimental results demonstrate that our model outperforms four benchmark models at stations with different bike usage levels. In particular, at low-demand stations, the prediction error of our model is lower than that of the other benchmark models. At high-demand stations, however, the performance of our model is similar to that of LSTM. At the other usage levels, our model still outperforms the benchmark models. Overall, FF-STGCN achieves better performance across stations with different bike usage levels.
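The four demand grades above are half-open intervals on hourly demand and can be encoded directly. How each station's hourly demand is aggregated (for example, averaged over the study period) is not stated here, so the sketch simply takes one number per station; the function name is hypothetical.

```python
from typing import Optional

# (grade, lower, upper): a station with lower < hourly demand <= upper gets this grade.
GRADE_BOUNDS = [(1, 106, 141), (2, 85, 106), (3, 59, 85), (4, 0, 59)]


def demand_grade(hourly_demand: float) -> Optional[int]:
    """Map hourly demand to grade 1 (high) through 4 (low); None if outside all ranges."""
    for grade, lower, upper in GRADE_BOUNDS:
        if lower < hourly_demand <= upper:
            return grade
    return None


print([demand_grade(v) for v in (141, 106, 85, 59, 0)])  # [1, 2, 3, 4, None]
```

Because each interval is open on the left and closed on the right, a boundary value such as 106 belongs to the lower grade (grade 2), and zero demand falls outside every grade.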

Performance of models at stations during peak hours
Satisfying users' travel needs during the morning and evening peak hours is important, because many users ride bikes during these periods to solve the last-mile problem. Accurate prediction of bike-sharing demand can therefore help operators develop appropriate rebalancing plans that meet users' needs at minimum cost. For this reason, we predict the supply and demand of bike-sharing separately during the morning peak (6:30 am to 10:00 am) and the evening peak (5:00 pm to 8:00 pm), and use MSE and MAE to evaluate model performance.
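Selecting the evaluation slots for the two peak windows can be sketched as a label on each slot's start time. Treating the windows as half-open (start included, end excluded) and labeling slots by their start time are our assumptions; the function name is hypothetical.

```python
from datetime import time
from typing import Optional

MORNING_PEAK = (time(6, 30), time(10, 0))   # 6:30 am to 10:00 am
EVENING_PEAK = (time(17, 0), time(20, 0))   # 5:00 pm to 8:00 pm


def peak_period(slot_start: time) -> Optional[str]:
    """Label a time slot by its start time; windows assumed half-open [start, end)."""
    if MORNING_PEAK[0] <= slot_start < MORNING_PEAK[1]:
        return "morning"
    if EVENING_PEAK[0] <= slot_start < EVENING_PEAK[1]:
        return "evening"
    return None


print(peak_period(time(9, 30)), peak_period(time(10, 0)), peak_period(time(17, 0)))
# morning None evening
```

With 30-minute slots, the 9:30 am slot still lies inside the morning peak because it ends exactly at 10:00 am.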
Fig 12 compares the predictive performance of various models during peak periods. The results show that the average prediction error of our proposed model is the lowest during both peak periods. Specifically, during the morning peak, the average MAE of FF-STGCN is smaller than that of the other benchmark models, and during the evening peak, both the MAE and MSE of FF-STGCN are better than those of the benchmarks. These results suggest that our proposed model outperforms the benchmark models during peak periods.

Ablation study
Comparative analysis of spatio-temporal modules. To validate the effect of different spatio-temporal modules on our model, we evaluated various combinations of the multi-scale spatio-temporal feature fusion module (FN), the flow-based feature learner (FFL), and the pattern-based feature learner (PFL). The results are given in Table 2 and visualized in Fig 13. The experimental results can be summarized as follows:
• Each spatio-temporal module contributes to accurate prediction. As more modules are combined, the prediction error decreases. This indicates that models considering multiple scales and features are superior to models considering only a single feature or scale.
• The FN+PFL and FN+FFL combinations both improve on the single-module FN model. Considering either flow features or similar bike usage pattern features is beneficial for predicting bike-sharing demand.
• The FN+FFL combination outperforms FN+PFL. This indicates that bike-sharing demand and supply prediction is more sensitive to flow features than to similar bike usage pattern features.
• The FN+FFL+PFL model shows better prediction performance than the FFL+PFL model. This demonstrates that incorporating multi-scale spatio-temporal features is beneficial for improving prediction results.

Comparative analysis of bike usage pattern similarity metrics.
To understand how the similarity of bike usage patterns between stations should be quantified, we evaluate FF-STGCN against two of its variants: one that uses DTW to quantify the similarity of bike usage patterns in the temporal dimension, and another that uses the Pearson coefficient to quantify the similarity in the spatial dimension. We name them FF-STGCN:D and FF-STGCN:P, respectively. To ensure a fair comparison, the hyperparameters of the two variants are set identical to those of the original model.
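The two similarity measures compared here can be sketched in plain Python: classic dynamic time warping (DTW) on two stations' demand series for the temporal dimension, and the Pearson coefficient on two stations' geographical feature vectors (such as POI counts) for the spatial dimension. The absolute-difference cost in DTW and the toy inputs are our assumptions, and the sketch does not show how FF-STGCN combines the two measures.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Classic DTW: minimum cumulative |x_i - y_j| cost over all warping paths."""
    n, m = len(x), len(y)
    # D[i][j] = cost of the best warping path aligning x[:i] with y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


# Temporal: two demand series that differ only by a time shift.
print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))  # 0.0
# Spatial: two hypothetical POI count vectors with proportional profiles.
print(pearson([4, 0, 2], [8, 0, 4]))          # 1.0
```

Note the opposite orientations: a lower DTW distance means more similar, whereas a higher Pearson coefficient means more similar, so the two must be aligned before neighbors are ranked.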
Table 3 reports the performance of FF-STGCN and its two variants in the 30-minute time slot on two indicators (MAE and MSE). The prediction error of the variant using the Pearson measure is lower than that of the variant using the DTW metric, which indicates that, for bike usage patterns, the spatial-dimension metric quantifies similarity more accurately than the temporal-dimension metric. However, although FF-STGCN:P achieves good prediction results, our proposed model still outperforms it on all indicators. This further demonstrates that considering the similarity of bike usage patterns between stations in both the temporal and spatial dimensions helps improve the accuracy of bike-sharing demand prediction.
Fig 14 shows the prediction error of FF-STGCN and its two variants over the whole day. Compared with its two variants, our proposed model performs better during daytime hours, which indicates that it predicts better in periods of high usage. It also shows that the spatio-temporal composite metric is an efficient way to select the semantic neighbors involved in the irregular convolution.

Conclusion
In this paper, we propose a usage pattern similarity-based dual-network for bike-sharing demand prediction, called FF-STGCN, and evaluate it on the BikeNY dataset. We compare FF-STGCN with five benchmark models to verify its effectiveness on real-world data.

Fig 2. Short-term and long-term flow matrices. https://doi.org/10.1371/journal.pone.0298684.g002
cost matrix D, find a path from the top-right to the bottom-left corner that minimizes the sum of the values of the elements traversed. This is the warping path of time series X and Y, denoted as $W(X, Y) = \{w_1, w_2, \dots, w_m\}$, $t - 1 \le m \le 2t - 2$. Fig 4 illustrates the execution process of the Dynamic Time Warping (DTW) algorithm described above. The squares in the figure represent the distance cost between two elements of the example time series, and the lines depict the warping path connecting the two example time series.

Fig 5 displays a heatmap of the temporal similarity between stations over the time period from t − k to t − 1.
Spatial similarity calculator. The geographical environment is a crucial spatial factor that is strongly correlated with user travel behavior and significantly influences bike-sharing demand prediction. At stations with similar geographical environments, people's travel times and destinations exhibit similarities. Hence, analyzing the similarity of the geographical

Fig 9. Irregular convolution. https://doi.org/10.1371/journal.pone.0298684.g009
However, as shown in Fig 10, our proposed model incurs higher time costs than the time series model LSTM and certain graph convolutional networks (such as MSTGCN and MC_STGCN). Compared with LSTM, our model introduces a network module designed to capture spatial dependencies, giving it a more complex structure that requires additional time for parameter optimization during training. Meanwhile, whereas our model predicts the supply and demand of bike-sharing at individual stations, MC_STGCN only needs to predict the overall bike-sharing demand within a specific region, which reduces data volume and computational complexity. Compared with MSTGCN, our proposed model considers both the spatio-temporal features of bike-sharing usage patterns between stations and the spatio-temporal features of traffic between stations, resulting in a more complex network architecture and longer training time. Additionally, our model introduces a module for learning the similarity of user usage patterns, which dynamically captures the similarity between usage patterns at different stations. However, the DTW and Pearson coefficient calculations in this module increase the time complexity and computational cost of our model.

Fig 14. Comparative metrics of bike usage patterns similarity. (a) MAE (b) MSE. https://doi.org/10.1371/journal.pone.0298684.g014
, $s_j$) is a trip that borrows from station $s_j$ and returns to a station other than $s_j$. $|\cdot|$ denotes the cardinality of a set. Definition 2 (Outflow matrices): At the $t$-th time slot, we define the bike-sharing outflow matrix as $O^{t}$, where $O^{t}_{i,j}$ is the number of bikes borrowed from station $s_i$ and returned to station $s_j$ during the $t$-th time slot.
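Definition 2 can be sketched as a counting step over trip records. We assume each trip is a (pick-up station index, drop-off station index, pick-up slot) tuple, that a trip belongs to the slot of its pick-up time, and that round trips (same pick-up and drop-off station) are skipped; these are illustrative assumptions, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple

# A trip as (pick-up station index i, drop-off station index j, pick-up time slot t).
Trip = Tuple[int, int, int]


def outflow_matrix(trips: Sequence[Trip], n_stations: int, t: int) -> List[List[int]]:
    """O^t[i][j]: number of bikes borrowed at s_i and returned at s_j in slot t.

    Assumptions: a trip belongs to the slot of its pick-up time, and round trips
    (i == j) are skipped.
    """
    O = [[0] * n_stations for _ in range(n_stations)]
    for i, j, slot in trips:
        if slot == t and i != j:
            O[i][j] += 1
    return O


trips = [(0, 1, 5), (0, 1, 5), (1, 2, 5), (2, 2, 5), (0, 2, 6)]
O5 = outflow_matrix(trips, n_stations=3, t=5)
print(O5)                        # [[0, 2, 0], [0, 0, 1], [0, 0, 0]]
print([sum(row) for row in O5])  # total outflow per station: [2, 1, 0]
```

Row sums give each station's total outflow in slot t, and column sums give the matching inflow seen from the destination side.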
) denotes the neighboring stations of $s_i$ in the graph, $W_k$ is a learnable parameter, and Aggr(·) is the flow aggregator in our network, which aggregates the high-dimensional spatio-temporal features of a node's neighboring nodes. It is computed by: